Information Classification and Extraction on Official Web Pages of Organizations
Similar Resources
Grouping Web Pages about Persons and Organizations for Information Extraction
Information extraction on the Web lets users retrieve specific information related to a query, especially a query on the name of a person or organization. Because names are not unique, the same name may map to multiple entities. The aim of this paper is to describe an algorithm to cluster the Web pages returned by the search engine so that pages belonging to different entities are clustered into ...
Path Set Operations for Clipping of Parts of Web Pages and Information Extraction from Web pages
Extracting parts of Web pages is attractive for two purposes. One is to clip parts of Web pages as one clips articles from newspapers. The other is to let software use the information on Web pages. In this paper we define operations to extract parts of Web pages, namely path set operations. The operations are for both clipping of parts of Web pages and information extraction from Web ...
Bootstraping Information Extraction Using Regularity of Web Pages
To annotate web documents with metadata automatically, we must prepare a database that stores annotation targets and these metadata. In the case of location information, we need a database that stores many named entities (NEs) and their location information (i.e., telephone number and address). In this paper, we present a bootstrapping approach to extract triples. We describe our extraction met...
Information Extraction from Hypertext Mark-Up Language Web Pages
Problem statement: Nowadays, many users use web search engines to find and gather information. Users face an increasing number of varied HTML information sources. The issue of correlating, integrating and presenting related information to users becomes important. When a user uses a search engine such as Yahoo or Google to seek specific information, the results are not only information about ...
Automatic Information Extraction for Multiple Singular Web Pages
The World Wide Web is now undeniably the richest and most dense source of information, yet its structure makes it difficult to make use of that information in a systematic way. This paper extends a pattern discovery approach called IEPAD to the rapid generation of information extractors that can extract structured data from semi-structured Web documents. IEPAD is proposed to automate wrapper genera...
Journal
Journal title: Computers, Materials & Continua
Year: 2020
ISSN: 1546-2226
DOI: 10.32604/cmc.2020.011158